{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "# Building Detection from Aerial Imagery and LiDAR Data\n",
    "\n",
    "[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/geoai/blob/main/docs/examples/building_detection_lidar.ipynb)\n",
    "\n",
    "This notebook demonstrates how to train semantic segmentation models for building detection from [NAIP aerial imagery](https://planetarycomputer.microsoft.com/dataset/naip) and [height above ground (HAG)](https://planetarycomputer.microsoft.com/dataset/3dep-lidar-hag) data derived from LiDAR data with just a few lines of code. You can adapt this notebook to segment other objects of interest (such as trees, cars, etc.) from aerial imagery and LiDAR data.\n",
    "\n",
    "## Install packages\n",
    "\n",
    "To use the new functionality, ensure the required packages are installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install geoai-py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import geoai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Download sample data\n",
    "\n",
    "We'll use the same dataset as the Mask R-CNN example for consistency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_aerial_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_naip.tif\"\n",
    "train_LiDAR_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_hag.tif\"\n",
    "train_building_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_buildings_train.geojson\"\n",
    "test_aerial_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_naip.tif\"\n",
    "test_LiDAR_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_hag.tif\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_aerial_path = geoai.download_file(train_aerial_url)\n",
    "train_LiDAR_path = geoai.download_file(train_LiDAR_url)\n",
    "train_building_path = geoai.download_file(train_building_url)\n",
    "test_aerial_path = geoai.download_file(test_aerial_url)\n",
    "test_LiDAR_path = geoai.download_file(test_LiDAR_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Visualize sample data\n",
    "\n",
    "Visualize the building footprints with the aerial imagery."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"TITILER_ENDPOINT\"] = \"https://giswqs-titiler-endpoint.hf.space\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(train_building_path, tiles=train_aerial_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualize the building footprints with the height above ground (HAG) data derived from LiDAR data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(train_building_path, tiles=train_LiDAR_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stack bands\n",
    "\n",
    "Stack the NAIP and HAG bands into a single image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_raster_path = \"las_vegas_train_naip_hag.tif\"\n",
    "geoai.stack_bands(\n",
    "    input_files=[train_aerial_path, train_LiDAR_path],\n",
    "    output_file=train_raster_path,\n",
    "    resolution=None,  # Automatically inferred from first image\n",
    "    overwrite=True,\n",
    "    dtype=\"Byte\",  # or \"UInt16\", \"Float32\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_raster_path = \"las_vegas_test_naip_hag.tif\"\n",
    "geoai.stack_bands(\n",
    "    input_files=[test_aerial_path, test_LiDAR_path],\n",
    "    output_file=test_raster_path,\n",
    "    resolution=None,  # Automatically inferred from first image\n",
    "    overwrite=True,\n",
    "    dtype=\"Byte\",  # or \"UInt16\", \"Float32\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Create training data\n",
    "\n",
    "We'll create the same training tiles as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out_folder = \"buildings\"\n",
    "tiles = geoai.export_geotiff_tiles(\n",
    "    in_raster=train_raster_path,\n",
    "    out_folder=out_folder,\n",
    "    in_class_data=train_building_path,\n",
    "    tile_size=512,\n",
    "    stride=256,\n",
    "    buffer_radius=0,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Train semantic segmentation model\n",
    "\n",
    "Now we'll train a semantic segmentation model using the new `train_segmentation_model` function. This function supports various architectures from `segmentation-models-pytorch`:\n",
    "\n",
    "- **Architectures**: `unet`, `unetplusplus` `deeplabv3`, `deeplabv3plus`, `fpn`, `pspnet`, `linknet`, `manet`\n",
    "- **Encoders**: `resnet34`, `resnet50`, `efficientnet-b0`, `mobilenet_v2`, etc.\n",
    "\n",
    "For more details, please refer to the [segmentation-models-pytorch documentation](https://smp.readthedocs.io/en/latest/models.html).\n",
    "\n",
    "Let's train a U-Net with ResNet34 encoder\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train U-Net model\n",
    "geoai.train_segmentation_model(\n",
    "    images_dir=f\"{out_folder}/images\",\n",
    "    labels_dir=f\"{out_folder}/labels\",\n",
    "    output_dir=f\"{out_folder}/unet_models\",\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    encoder_weights=\"imagenet\",\n",
    "    num_channels=5,\n",
    "    num_classes=2,  # background and building\n",
    "    batch_size=8,\n",
    "    num_epochs=50,\n",
    "    learning_rate=0.001,\n",
    "    val_split=0.2,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.plot_performance_metrics(\n",
    "    history_path=f\"{out_folder}/unet_models/training_history.pth\",\n",
    "    figsize=(15, 5),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Run inference\n",
    "\n",
    "Now we'll use the trained model to make predictions on the test image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define paths\n",
    "masks_path = \"building_masks.tif\"\n",
    "model_path = f\"{out_folder}/unet_models/best_model.pth\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run semantic segmentation inference\n",
    "geoai.semantic_segmentation(\n",
    "    input_path=test_raster_path,\n",
    "    output_path=masks_path,\n",
    "    model_path=model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    num_channels=5,\n",
    "    num_classes=2,\n",
    "    window_size=512,\n",
    "    overlap=256,\n",
    "    batch_size=8,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Vectorize masks\n",
    "\n",
    "Convert the predicted mask to vector format for better visualization and analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_vector_path = \"building_masks.geojson\"\n",
    "gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Add geometric properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf_props = geoai.add_geometric_properties(gdf, area_unit=\"m2\", length_unit=\"m\")\n",
    "print(f\"Number of buildings: {len(gdf_props)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Visualize results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_raster(masks_path, nodata=0, basemap=test_aerial_url, backend=\"ipyleaflet\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(gdf_props, column=\"area_m2\", tiles=test_aerial_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf_filtered = gdf_props[(gdf_props[\"area_m2\"] > 50)]\n",
    "print(f\"Number of buildings: {len(gdf_filtered)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(gdf_filtered, column=\"area_m2\", tiles=test_aerial_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.create_split_map(\n",
    "    left_layer=gdf_filtered,\n",
    "    right_layer=test_aerial_url,\n",
    "    left_args={\"style\": {\"color\": \"red\", \"fillOpacity\": 0.2}},\n",
    "    basemap=test_aerial_url,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performance Metrics\n",
    "\n",
    "**IoU (Intersection over Union)** and **Dice score** are both popular metrics used to evaluate the similarity between two binary masks—often in image segmentation tasks. While they are related, they are not the same.\n",
    "\n",
    "---\n",
    "\n",
    "### 🔸 **Definitions**\n",
    "\n",
    "#### **IoU (Jaccard Index)**\n",
    "\n",
    "$$\n",
    "\\text{IoU} = \\frac{|A \\cap B|}{|A \\cup B|}\n",
    "$$\n",
    "\n",
    "* Measures the overlap between predicted region $A$ and ground truth region $B$ relative to their union.\n",
    "* Ranges from 0 (no overlap) to 1 (perfect overlap).\n",
    "\n",
    "#### **Dice Score (F1 Score for Sets)**\n",
    "\n",
    "$$\n",
    "\\text{Dice} = \\frac{2|A \\cap B|}{|A| + |B|}\n",
    "$$\n",
    "\n",
    "* Measures the overlap between $A$ and $B$, but gives more weight to the intersection.\n",
    "* Also ranges from 0 to 1.\n",
    "\n",
    "---\n",
    "\n",
    "### 🔸 **Key Differences**\n",
    "\n",
    "| Metric   | Formula                     | Penalizes                      | Sensitivity                      |\n",
    "| -------- | --------------------------- | ------------------------------ | -------------------------------- |\n",
    "| **IoU**  | $\\frac{TP}{TP + FP + FN}$   | FP and FN equally              | Less sensitive to small objects  |\n",
    "| **Dice** | $\\frac{2TP}{2TP + FP + FN}$ | Less harsh on small mismatches | More sensitive to small overlaps |\n",
    "\n",
    "> TP: True Positive, FP: False Positive, FN: False Negative\n",
    "\n",
    "---\n",
    "\n",
    "### 🔸 **Relationship**\n",
    "\n",
    "Dice and IoU are mathematically related:\n",
    "\n",
    "$$\n",
    "\\text{Dice} = \\frac{2 \\cdot \\text{IoU}}{1 + \\text{IoU}} \\quad \\text{or} \\quad \\text{IoU} = \\frac{\\text{Dice}}{2 - \\text{Dice}}\n",
    "$$\n",
    "\n",
    "---\n",
    "\n",
    "### 🔸 **When to Use What**\n",
    "\n",
    "* **IoU**: Common in object detection and semantic segmentation benchmarks (e.g., COCO, Pascal VOC).\n",
    "* **Dice**: Preferred in medical imaging and when class imbalance is an issue, due to its sensitivity to small regions."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "geo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
